Attribution: Javier Luraschi’s talk slides from SDSS 2019
source: https://hadoop4usa.wordpress.com/2012/04/13/scale-out-up/
MapReduce (Hadoop) was the original big kid on the block in terms of scaling out.
Spark’s improvements in speed and ease of use mean there is now a faster and smoother kid on the block…
source: Zaharia et al. (2016). Apache Spark: A Unified Engine for Big Data Processing
sparklyr + Databricks demo
Notebook 1 - Install sparklyr on a Databricks cluster
Notebook 2 - Analysis demo